Speech Recognition AI
22 October, 2020: By Ajoy Maitra

Speech recognition is one of the intelligent tasks a machine can perform: software built for the purpose recognises spoken words and transcribes them into readable text. In the present era of digital transformation, which is redefining work culture as well as corporate growth, performing tasks intelligently leads to better performance. With the advanced integration of Artificial Intelligence, automated tasks first need to recognise what is being asked of them. Speech recognition plays a vital role in bringing artificial intelligence closer to ordinary people, because it lets you interact with a machine in the same way you would speak with anyone else.

Training an Artificial Intelligence system involves feeding it various datasets, which the machine learns to recognise through algorithms in order to meet common requirements. In modern applications, the user experience opens several paths for data collection with permission, where the user allows the microphone to be used to record audio and grants access for that audio to be stored in a database.
Machine Learning uses such datasets to train supervised models that carry out automated tasks once speech is recognised. However, challenges arise when the speech is unclear or contains background noise, which makes it difficult for the automated recognition system to decipher or transcribe what was said. Another common issue, which has only recently been addressed, is recognising a child's voice, as children speak differently, with broken words and pauses.




FACEBOOK AI


Facebook AI Speech Recognition

Facebook has recently launched an open-source framework known as wav2vec 2.0, which specialises in self-supervised Automatic Speech Recognition (ASR). With just 10 minutes of transcribed speech data, its models reach a benchmark word error rate (WER) of 8.6%.
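For readers unfamiliar with the metric, WER is conventionally the number of word substitutions, deletions and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. The following is a minimal sketch of that calculation, not code from wav2vec 2.0: the function name `word_error_rate` is illustrative, and the lowercasing and whitespace splitting are assumptions, since real benchmarks apply their own text normalisation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumed normalisation: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Two of the four reference words are transcribed incorrectly.
print(word_error_rate("turn on the light", "turn of the lights"))  # 0.5
```

On this scale, a WER of 8.6% means that fewer than nine words in every hundred need correcting.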

The speech used for recognition also includes background noise as part of the training data, with the aim of bringing the WER down considerably and ensuring higher accuracy. However, high accuracy is not achievable with limited training data, so Facebook decided to make the framework free to use by open-sourcing the code, so that training can draw on around 51 different languages and over 16,000 hours of audio.


AMAZON ALEXA


Amazon Alexa Speech Recognition

Amazon has recently announced a 4th Gen Echo smart speaker that brings a major change in audio recognition: faster processing through on-device speech recognition. Using local machine learning, the device processes speech locally before sending anything to the cloud, which ensures a faster response. This also reduces the huge amount of data sent to and stored in the cloud, which would otherwise require even more processing power and memory bandwidth. As per the Amazon blog post, Shehzad Mevawalla, the Director of Alexa Speech, stated:

I think we've made huge strides in the last couple of years. For example, we can now run full-capability speech recognition on-device. The models that used to be many gigabytes in size, required huge amounts of memory, and ran on massive servers in the cloud - we're now able to take those models and shrink them into tiny footprints and fit them into devices that are no larger than a tin can.


Thanks to this advance in Alexa speech recognition, neural networks can accept acoustic speech and directly output the transcribed text, reducing response latency. The capability has been introduced to Alexa devices through Amazon's neural processor, the AZ1, which is optimized to run Deep Neural Networks (DNNs).